Thanks to everyone who attended my SharePoint Saturday presentation on BI with SharePoint 2010. Here's a link to my slide deck
Feel free to email me at kgoff@kevinsgoff.net.
KG
Thanks to everyone who attended my SharePoint Saturday presentation on BI with SharePoint 2010. Here's a link to my slide deck
Feel free to email me at kgoff@kevinsgoff.net.
KG
Posted at 07:59 AM in SQL Development, Speaking News | Permalink | Comments (0)
I'm now a "double-blogger"! J
I recently started a new blog called I-Data Weekly – it's sponsored by SetFocus, LLC. SetFocus is a Microsoft Gold Certified Partner for Learning Solutions. I've been working with SetFocus as a contract trainer and courseware contributor for SetFocus' Master's Program - Business Intelligence Track.
The intention of the blog is to show the type of content we cover in the actual BI Masters training program. We cover all the products in the Microsoft BI stack (SSIS, SSAS, SSRS, MDX programming, SharePoint, and PerformancePoint Server).
Each week I'll have a new technical article on the I-Data Weekly site. (Right now the articles are on a regular site – eventually they'll be part of an actual blog site). As articles go up, I'll post a link here on my blog. Right now I have two entries up there – the first is on OLAP/MDX named sets, and the second is on charting with PerformancePoint Server.
So feel free to check out the new blog and SetFocus' site – and if you're interested in BI training (or other training that SetFocus offers, such as .NET, SQL Server, etc.), then definitely contact them for more information! SetFocus also offers Online Training through their GRID system. SetFocus is a super company – very progressive, very talented and hard-working group of people. They also support the Northern New Jersey .NET and SQL Server User Groups (the monthly meetings are held at SetFocus' great site in Parsippany, NJ). I'm honored to be a part of their team.
KG
Posted at 08:43 PM in SQL Development | Permalink | Comments (1)
I heard one of my all-time favorite songs today on the radio: "Touch, Peel, and Stand", by Days of the New.
Then I was looking at some SQL 2005 code that recently came in handy for me at a client site – specifically, the ability to implement a TRY/CATCH block inside a stored procedure, and RAISE an error back to the calling application.
Now, the fact that you can insert TRY/CATCH code into SQL Server 2005 stored procedures isn't exactly "hot off the presses" news – most people have been aware of it for some time. Additionally, the ability to raise an error (using RAISERROR) goes back to SQL Server 2000.
But what's cool is that you can combine the two, to check for an error, grab information for that error, and then raise that error back to the application. So if you're also doing a TRY/CATCH in your application, you can access the error message that was raised in the SQL RAISERROR call.
So first, let's look at a very simple .NET call to a basic stored procedure using TRY/CATCH block. The stored proc will also have a TRY/CATCH block, and will raise any error back to the data provider, so that we can catch the error in .NET.
SqlConnectionStringBuilder osb = new
SqlConnectionStringBuilder();
osb.DataSource = "localHost";
osb.IntegratedSecurity = true;
osb.InitialCatalog = "AdventureWorks";
SqlConnection oConn = new
SqlConnection(osb.ConnectionString);
SqlCommand oCmd = new
SqlCommand("dbo.TestSPRaiseError", oConn);
oConn.Open();
try
{
oCmd.ExecuteNonQuery();
}
catch (Exception eErr)
{
// Either write out the error, or just display to the screen
MessageBox.Show("Error occured!\n\n" + eErr.Message.ToString());
}
Now, here's the SQL code, a test stored procedure that intentionally tries to violate a database constraint by deleting an order header ID for which line items exist.
CREATE
PROCEDURE dbo.TestSPRaiseError
AS
BEGIN
SET
NOCOUNT
ON
BEGIN
TRANSACTION
BEGIN
TRY
-- Try to do something that will generate an error
DELETE
FROM Purchasing.PurchaseOrderHeader WHERE PurchaseOrderID = 6
END
TRY
BEGIN
CATCH
DECLARE @errorseverity int, @errornumber int,
@errormessage nvarchar(4000), @errorstate int,
@ErrorLine int, @ErrorProc nvarchar(200)
-- Grab error information from SQL functions
SET @errorseverity =
ERROR_SEVERITY()
SET @errornumber =
ERROR_NUMBER()
SET @errorMessage =
ERROR_MESSAGE()
SET @errorstate =
ERROR_STATE()
SET @ErrorLine =
ERROR_LINE()
SET @ErrorProc =
ERROR_PROCEDURE()
-- Construct a detailed error message to send back
SET @ErrorMessage=
'Error '
+
CAST(@ErrorNumber AS
VARCHAR(10))
+
' in procedure: '
+ @ErrorProc +
' Line: '
+
CAST(@ErrorLine AS
VARCHAR(10))
+
' Error text: '
+ @ErrorMessage
-- Not all errors generate an error state, to set to 1 if it's zero
IF @ERRORSTATE = 0
SET @ERRORSTATE = 1
RAISERROR
(@errormessage , @ERRORSEVERITY, @ERRORSTATE, @ERRORNUMBER)
-- If the error renders the transaction as uncommittable, we must rollback
IF
XACT_STATE()
< 0
ROLLBACK
TRANSACTION
END
CATCH
COMMIT
TRANSACTION
END
GO
Again, nothing earth-shattering, but hopefully a nice-to-know. There are many ways you could construct the error message and then deal with it in the client piece.
I just added Touch, Peel, and Stand to my "Midnight Mix" on my MP3 player. It's 3 AM here on the east coast, but I guess that means it's midnight somewhere!
KG
Posted at 11:22 PM in SQL Development | Permalink | Comments (1)
A friend of mine told me yesterday that Joe Celko has a new SQL book: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL
That should give me something good to read in between the NCAA Conference Tournament games this weekend. J
KG
Posted at 02:06 AM in SQL Development | Permalink | Comments (0)
If you've ever written a database trigger in SQL Server, you know that the INSERTED/DELETED system tables are great for seeing the contents of rows before and after an update.
Unfortunately, as recently as SQL Server 2000, it wasn't easy getting information from INSERTED/DELETED rows outside of a trigger. As a matter of fact, INSERTED/DELETED tables were only visible inside a trigger. The best you could do was write code inside a trigger, to copy data from INSERTED/DELETED into temp tables, and then read the temp tables outside the trigger. Sure, you could make it work, but it was like eating broccoli – yuck.
(My apologies to all you broccoli-lovers out there)
Well, SQL Server 2005 introduced a new capability called OUTPUT, which you could append to an INSERT/UPDATE/DELETE statement. Essentially, OUTPUT allows you to specify the rows in the INSERTED/DELETED system tables – so you can actually issue an UPDATE statement, and also get values from the INSERTED/DELETED tables as a result set. Sound too good to be true? Well, check out this little sample:
-- Create a test table variable, add three rows to it, and then update the first two
DECLARE @tTest TABLE ( IDCol int identity,Amount decimal(10,2))
INSERT INTO @tTest VALUES (100)
INSERT INTO @tTest VALUES (200)
INSERT INTO @tTest VALUES (300)
UPDATE TOP(2) @tTest SET Amount = Amount * 10
OUTPUT INSERTED.IDCol, DELETED.Amount as OldAmount, Inserted.Amount as NewAmount
-- will yield the following:
IDCol OldAmount NewAmount
1 100.00 1000.00
2 200.00 2000.00
All I've done in the sample above is add an OUTPUT statement to the end of the UPDATE statement, and then I specified the columns. Obviously this is a simple example – but if the table contained calculated columns, or columns affected by a trigger, the OUTPUT of any columns from the INSERTED table would reflect those calculations.
Pretty cool, eh? This means no more round trips back to the server (or at least no more additional SQL statements) to retrieve what changed.
Now, there's one more thing – suppose you wanted to direct the values from INSERTED/DELETED into a table variable. You could the following, to redirect the OUTPUT to a table variable. (This way, I can use OUTPUT for multiple tables)
DECLARE @tResults TABLE (IDCol int, OldAmount decimal(10,2), NewAmount decimal(10,2))
UPDATE TOP(2) @tTest
SET Amount = Amount * 10
OUTPUT INSERTED.IDCOL, DELETED.AMOUNT, INSERTED.AMOUNT INTO @tResults
SELECT * FROM @tResults
KG
Posted at 06:48 PM in SQL Development | Permalink | Comments (0)
Well, between watching college basketball and teaching and writing my next CoDe article, I've ignored my blog over the last month.
I have a community presentation that I've done over the last year and a half – it covers ASP.NET AJAX and SQL 2005. It's based on an actual client application that runs against a database with about a million rows.
Here's the "end result" . This is a bare-boned version, mainly intended to demonstrate fundamentals The client version, which I'm not allowed to show, is much more "pimped out" J
The presentation focuses on building a stored procedure in SQL 2005 that can handle the following:
OK, seems pretty simple? Well, it's amazing how a seemingly simple module can contain many fundamentals!
I've been in a debate with others about use of stored procedures for a task like this. I advocate stored procs as a starting point, unless there are critical reasons not to. Given that there are a good handful of variables here, some will choose the path of dynamic SQL, of constructing a SQL expression. Well, that's fine – however, it CAN be done with a stored procedure. Through the use of CASE statements, COALESCE statements to deal with NULL parameters, one can build a stored proc with the same flexibility. Bottom line: where possible, I'll always opt for creating a database API for tasks, and set up a basic test environment (which I'll show further down).
So here's the stored procedure. It receives the text search parameters, along with a startrowIndex (that's managed by the web application), the sort column to be used, as well as an optional "jump-to" character.
Also note there's an XML parameter, for the list of status codes. Sure, this particular situation only has a handful of status codes….but you could have other instances of variable selections where the list might be much larger (say a user selects 20 customers, or an even greater number of products). SQL 2005's XML capabilities are perfect for something like this – you can pass an XML string as a parameter, and shred it into a table variable back in SQL Server. (I've included a reusable UDF for this at the end, it's called XMLToTable, which you'll see used in the following stored proc):
ALTER PROCEDURE [dbo].[LookupEmployeeData]
@LastName varchar(50)=null, @FirstName varchar(50)=null,
@Address varchar(50)=null, @City varchar(50)=null,
@State varchar(2)=null, @Zip varchar(50)=null,
@StartRowIndex int, @MaxRows int,
@AlphaChar varchar(1)=null, @SortCol varchar(20)=null
@XMLStatuses XML
AS
BEGIN
-- StartRowIndex represents the first row to grab, based on current sort
-- MaxRows represents the max # of rows to return
-- alphaChar represents a "jump-to" character, based on current sort
-- (if an alphaChar is specified,
-- then the StartRowIndex is ignored
-- So someone might be sorting on City, and wants to jump straight to the 'R' cities
-- JumpChar is to give user an easier way than click NEXT…NEXT…NEXT…NEXT
-- All search parms are optional
SET NOCOUNT ON
DECLARE @lJumpToChar bit
-- If user did not specify a "jump-to" alphanumeric char, then
-- we're doing straight paging, based on the startrow Index
IF @AlphaChar is null
SET @lJumpToChar = 0
ELSE
-- If user DID specify a "jump-to" alphanumeric char, then
-- we're going straight to the first occurance of the alphachar,
-- based on the current sort
SET @lJumpToChar = 1
-- Query into CTE for 'base' results, and assign a row #, based on the ORDER
WITH CustListTemp AS
(SELECT CustomerID, LastName, FirstName, Address, City, State,
Zip, ROW_NUMBER() OVER (ORDER BY
CASE @SortCol
WHEN 'LASTNAME' THEN LastName + Firstname
WHEN 'ADDRESS' THEN Address
WHEN 'CITY' THEN City + LastName + Firstname
WHEN 'STATE' THEN STATE + LastName + Firstname
WHEN 'ZIP' THEN ZIP + LastName + Firstname
ELSE LastName + Firstname END)
AS RowNum
FROM Customers
-- Use the Table-valued UDF XMLToTable to shred an XML string
-- into a table variable, for querying against a set of values
JOIN dbo.XMLToTable(@XMLStatuses) StatList
ON StatList.IntPk = Customers.StatusFK
-- Check all the text searches…use COALESCE (or ISNULL), since some
-- could be NULL
WHERE LastName LIKE '%' + COALESCE(@LastName,LastName)+ '%' AND
FirstName LIKE '%' + COALESCE(@FirstName,FirstName) + '%' AND
Address LIKE '%' + COALESCE(@Address,Address) + '%' AND
City LIKE '%' + COALESCE(@City,City) + '%' AND
State LIKE '%' + COALESCE(@State,State) + '%' AND
Zip LIKE '%' + COALESCE(@Zip,Zip) + '%' )
-- Now query the CTE, for the TOP (N) rows,
-- based on either row number position (paging)
-- or the JumpChar
SELECT TOP (@MaxRows) CustomerID, LastName, FirstName,
Address, City, State, Zip, RowNum
FROM
( SELECT CustListTemp.*,
(SELECT COUNT(*) from CustListTemp) AS RecCount FROM CustListTemp ) CustList
WHERE
CASE
-- If the JumpChar is true, check the first character
-- of the sort order column against the AlphaChar
WHEN @lJumpToChar = 1 AND @SortCol= 'LASTNAME' AND
SUBSTRING(LastName,1,1) >= RTRIM(@AlphaChar) THEN 1
WHEN @lJumpToChar = 1 AND @SortCol= 'ADDRESS' AND
SUBSTRING(Address,1,1) >= RTRIM(@AlphaChar) THEN 1
WHEN @lJumpToChar = 1 AND @SortCol= 'CITY' AND
SUBSTRING(City,1,1) >= RTRIM(@AlphaChar) THEN 1
WHEN @lJumpToChar = 1 AND @SortCol= 'STATE' AND
SUBSTRING(State,1,1) >= RTRIM(@AlphaChar) THEN 1
WHEN @lJumpToChar = 1 AND @SortCol= 'ZIP' AND
SUBSTRING(Zip,1,1) >= RTRIM(@AlphaChar) THEN 1
-- If JumpChar is false, then check the row number
-- for the valid range
-- Note that if StartRowIndex is -1, that means user
-- wants the last page…in that case, count backwards from MaxRows
WHEN @lJumpToChar = 0 AND RowNum BETWEEN
(CASE @StartRowIndex
WHEN -1 THEN ( RecCount ) - @MaxRows
ELSE @StartRowIndex
END )
AND
( CASE @StartRowIndex
WHEN -1 then ( RecCount )- @MaxRows
ELSE @StartRowIndex
END) + @MaxRows
THEN 1
ELSE 0 END = 1
END
GO
And I promised the code for the reusable Table-valued UDF, to "shred" an XML string into a table variable (for subsequent joining). It uses the new XQUERY capabilities in 2005 to read nodes from an XML string. With this, I can handle any situation where a user passes a variable number of selections. Here it is:
ALTER FUNCTION [dbo].[XMLtoTable]
(@XMLString XML )
RETURNS
@tPKList TABLE ( IntPK int )
AS
BEGIN
INSERT INTO @tPKList
SELECT Tbl.col.value('.','int') as IntPK
FROM @XMLString.nodes('//IDpk' ) Tbl(col)
RETURN
END
So, that's the stored procedure. Now, I always like to test my stored proc inside SQL Management Studio – here's such a test.
So, that's it. Most people who have seen this, and/or have attended the presentation, like the idea. I know that some folks have taken the code and have implemented it for their own situations. Of course, some who eschew stored procedures will think I'm smoking crack. Hey, that's fine.
KG
Posted at 12:49 AM in SQL Development | Permalink | Comments (2)
Over the last week, I've been talking to a company that builds solutions for public health. I'm going to be doing some work for them. They are as firm as can be about stored procedures. Always glad to see that!
No jazz tonight, bad bad headache.
KG
Posted at 11:13 PM in SQL Development | Permalink | Comments (0)
I've been involved in some recent discussions about using SQL Server stored procedures for database updates. Some individuals believe that stored procedures are not as flexible as other approaches, for updating back-end tables. I've heard different explanations for these beliefs, such as problems dealing with NULLs, and only updating columns that have changed.
A year ago, I wrote my own stored procedure generator for all of my INSERT/UPDATE/DELETE procedures. So I decided to review the assertions about stored procedures, and what I could add to my generator to handle different types of situations. Here were some of the issues/scenarios in question (whittled down to the basics):
Before I continue, some people have asked me about #2. Believe me, I avoid NULL values whenever I can. However, sometimes they're necessary. The deal of the survey application is one example. Here's another: think of an insurance application that ties individual accounting transactions to a particular policy. In some instances, a transaction might be shifted from policy 'A' to another policy, but not right away. So the related policy foreign key might need to be set to NULL for a period of time, until the client determines the correct foreign key. Bottom line: sometimes it's necessary to set values from non-null to NULL.
So our standard UPDATE stored procedure needs to handle both of these situations. Handling the first one (by itself) isn't all that tricky – we can define each stored procedure parameter with a default value of NULL, and then check for a NULL value in each UPDATE…SET statement to determine whether to update based on the parameter, or just the value of the existing column.
For a simple example, suppose we have a table called NameTest, with a column for each common datatype: PK (int), FullName (Char), LastEvalDate (DateTime,) Salary (Decimal), and Interviewed (bit). Our update procedure would look something like this:
ALTER PROCEDURE [dbo].[UpdateData]
@PK int, @FullName char(100)=null, @LastEvalDate DateTime=null,
@Salary Decimal(14,0)=null, @Interviewed bit=null
AS
BEGIN
SET NOCOUNT ON;
UPDATE NameTest
SET FullName = COALESCE(@FullName, FullName),
LastEvalDate=COALESCE(@LastEvalDate, LastEvalDate),
Salary = COALESCE(@Salary, Salary),
Interviewed=COALESCE(@Interviewed, Interviewed)
WHERE PK = @PK
END
If you KNOW that you'll NEVER need to set a non-null value to NULL, then the code above should be fine. If you only need to update the Salary column, you can just pass the PK value and Salary value as parameters – all the others will be defaulted to NULL, and those columns will simply be set to themselves, through use of COALESCE (which behaves similar to ISNULL).
However, it gets a little complicated to also address rule #2 above. If you need to NULL out entries, then the above code won't work. Why? Because if we explicitly pass a NULL value as a parameter, the COALESCE will kick in, in the same way it does when the parameter is defaulted to null. As a result, if we try to use the code above to pass a NULL value to the @Salary parameter, the UPDATE statement will simply set the column value back to itself, instead of setting it to NULL.
So we're in a bit of a Catch-22 here…sometimes the code will encounter a NULL value (an implicit NULL), because we didn't pass the parameter, and so we want the database column value to stay the same. And then sometimes the code will encounter a NULL value (an explicit NULL), because we want to null out the column, so the database column value should be set to NULL.
Unfortunately, inside the stored procedure, there doesn't seem to be any way to distinguish between an implicit NULL (because the parameter wasn't set, and therefore defaulted to NULL) and an explicit NULL. We're going with a mindset that "not all NULLs" are created equal, but the stored proc is blind to that.
Now, some developers get around this by passing an additional Boolean parameter for each column, to tell the update stored procedure that the NULL value for a specific column is an explicit one. While this will work, it can wind up doubling the number of parameters, and means that a developer must pass a true value for each column that needs to be NULLED out.
So if we're not going to use a 2nd set of parameters, what then? Fortunately, there's an answer. The answer is a bit of a kludge, but a kludge that's contained within the stored proc, and won't require additional parameters. Instead of defaulting each parameter to NULL, we'll default each parameter to a "token" value that would never be used otherwise (something like "-999999" for character types, -999999 for numerics, '1-1-1700' for dates…bits need special handling, which I'll cover in a moment). Sure, "-999999" is kludgy, but it's being generated, and it only used inside the stored proc.
So if someone truly passes in a NULL value, we know in the code that the parameter has a NULL value – and if someone doesn't specify the parameter, it will be defaulted to the token (e.g. "-999999") for that data type. The code is as follows:
ALTER PROCEDURE [dbo].[UpdateData]
@PK int, @FullName char(100)='-999999',
@LastEvalDate DateTime = '01-01-1800',
@Salary Decimal(14,0) = -999999,
@Interviewed int = -99
AS
BEGIN
SET NOCOUNT ON;
UPDATE NameTest
SET FullName = CASE
WHEN @FullName = '-999999'
THEN FullName
ELSE @FullName END,
LastEvalDate = CASE
WHEN @LastEvalDate = '01-01-1800'
THEN LastEvaldate
ELSE @LastEvalDate END,
Salary = CASE
WHEN @Salary= -999999
THEN Salary
ELSE @Salary END,
Interviewed = CASE
WHEN @Interviewed = -99
THEN Interviewed
ELSE CAST(@Interviewed AS BIT) END
OUTPUT Inserted.*
WHERE PK = @PK
END
Note that the bit column for "Interviewed" is declared as an integer parameter. This is because a bit can only be 0, 1, or NULL. There's no other token value that we can set. So we'll define it as an integer parameter with a token default value of -99, and then CAST it as a BIT if the value is not the token value.
OK…you may be wondering…does this mean that developers need to pass an integer instead of a bit, when they're dealing with a bit column? Fortunately, no. A developer can still pass a Boolean true/false to an integer parameter.
Before we move on, one other thing – the UPDATE proc has a clause that reads "OUTPUT Inserted.*". SQL 2005 allows you to OUTPUT the values in the INSERTED system table (which was previously only available inside a trigger) as part of an INSERT or UPDATE statement. This allows you to immediately return the row (as it exists in SQL 'after' the UPDATE), without needing to do another query. This is a really nice new feature in T-SQL 05.
So now we have the standard for an UPDATE stored procedure that handles both a variable number of parameters, as well as explicit NULL handling. Let's look at some .NET code for calling this stored procedure:
SimpleDataAccess oData = new SimpleDataAccess();
List<SqlParameter> oList = new List<SqlParameter>();
int PK = 1;
string FullName = "Kevin S. Goff";
decimal Salary = 100000;
bool Interviewed = true;
DateTime LastEvalDate = DateTime.Today;
oList.Add(new SqlParameter("@PK",PK));
oList.Add(new SqlParameter("@FullName", FullName));
oList.Add(new SqlParameter("@Salary",Salary));
oList.Add(new SqlParameter("@Interviewed", Interviewed));
oList.Add(new SqlParameter("@LastEvalDate", LastEvalDate));
DataSet dsReturn = oData.RunStoredProc("dbo.UpdateData",oList);
Of course, that's an example of updating every column. Suppose we used the RowVersion capabilities in ADO.NET and determined that all the user did was set the LastEvalDate to something, and the interviewed bit to NULL. You'd only pass the following parameters:
oList.Add(new SqlParameter("@PK",PK));
oList.Add(new SqlParameter("@Interviewed", System.DBNull.Value));
oList.Add(new SqlParameter("@LastEvalDate", LastEvalDate));
DataSet dsReturn = oData.RunStoredProc("dbo.UpdateData",oList);
So that's it! Of course, nothing ever seems to be perfect. Aside from the kludgy token concept (which doesn't bother me too much, since the code is generated and self-contained), the only other thing I'd eventually like to address is columns being set to themselves when the parameter wasn't passed. CASE statements don't seem to work outside a SET statement (at least not that I can see), so it would appear I'd have to resort to dynamic SQL inside the stored procedure, to construct the UPDATE statement manually. While I'm not 100% opposed to dynamic SQL, I always try to look for other approaches first.
Additionally, I'm not positive if the transaction log gets hit when a column is set to itself (definitely something worth investigating – it would be great if SQL has the smarts not to write to the log if a column value hasn't actually changed). I've talked to DBAs who have said they don't care about this, but I'm still curious myself.
Final note: if you write UPDATE triggers, keep in mind that the UPDATED function will return a true for any column that's specified in an UPDATE, even if the column value itself isn't changed. That's why triggers should always compare INSERTED.<column> and DELETED.<column> to see if something has actually changed.
So….what do we have…the means to use stored procedures for NULL handling, and to only pass up the columns that have changed.
At some point, I'm going to publish the .NET code for generating stored procs (it's not rocket science, fortunately).
Time for some Miles Davis…
KG
Posted at 11:16 PM in SQL Development | Permalink | Comments (0)
True or False: "Solving problems that are hierarchical and nested in nature are best helped with a record oriented approach rather than a SET ORIENTED approach"
This assertion comes from the Visual FoxPro environment, where developers can navigate through result sets (VFP cursors…not to be confused with SQL Server Cursors) at the record level, very quickly and easily. No question, VFP is a great database tool. So great in fact, that some have used it for so long, and to solve so many problems, that it's easy to conclude that VFP's offerings are the only optimal hammer in the toolbox for certain operations.
SQL Server 2005 conforms more closely to the ANSI-99 SQL spec – and some of the features of the ANSI-99 standard making querying more flexible than before. One such feature is Common Table Expressions and recursive querying of hierarchical data. I'm going to devote some blog entries over the next few weeks to recursive querying, as I believe developers regularly face this challenge.
I'm going to start with a very simple example, to demonstrate the mechanics of recursive querying in SQL 2005. First, here's a simple set of hierarchical data – a product hierarchy with Brands, Groups, and Items. Each row contains a unique integer identifier, a description, and a reference to the row's parent:
DECLARE @tProducts TABLE (ID int, Name char(50), ParentID int)
INSERT INTO @tProducts VALUES (1, 'Brand 1', null)
INSERT INTO @tProducts VALUES (2, 'Brand 2', null)
INSERT INTO @tProducts VALUES (3, 'Brand 3', null)
INSERT INTO @tProducts VALUES (6, 'Brand 1, Group 1', 1)
INSERT INTO @tProducts VALUES (7, 'Brand 1, Group 2', 1)
INSERT INTO @tProducts VALUES (8, 'Brand 1, Group 3', 1)
INSERT INTO @tProducts VALUES ( 9, 'Brand 2, Group 1', 2)
INSERT INTO @tProducts VALUES (10, 'Brand 2, Group 2', 2)
INSERT INTO @tProducts VALUES ( 11, 'Brand 3, Group 3', 3)
INSERT INTO @tProducts VALUES (12, 'Brand 1, Group 1, Item 1', 6)
INSERT INTO @tProducts VALUES (13, 'Brand 1, Group 1, Item 2', 6)
INSERT INTO @tProducts
VALUES (14, 'Brand 1, Group 1, Item 1, SKU 1', 12)
Here's the challenge: for any specific item in this table, I want to know all the child records, going all the way down the hierarchy. Think about how you'd handle this query in SQL 2000 using SQL statements. While it was doable, querying hierarchical data was never an easy task. Fortunately, SQL 2005 makes this much simpler. Here's the entire query, which sets a search string, and then retrieves all the child records for that search string:
DECLARE @cSearch char(50)
SET @cSearch = 'Brand 1, Group 1'
; WITH ProductCTE AS
-- Anchor query
(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.parentID )
SELECT * FROM ProductCTE
I'll confess, when I first looked at CTE and recursive query syntax, I felt pretty stupid that I couldn't grasp it. So, I took an online example, completely changed the sample data, built my own example, and then I didn't feel quite so stupid. ;)
Here's the deal/trick with recursive queries - in a "normal" SQL SELECT statement, you're querying from a table that isn't changing during the lifetime of the query (OK, data could be inserted from another session, but that's a different topic). You're querying directly FROM one table, and into a totally separate result set.
In a recursive query, some of the data you need to query is actually a result of a prior phase of the query. So results become the source for future results. Think about this with respect to our simple example, where we want all the children of a particular row – we can't pull the lowest level (SKU) without pulling the level above that (Item), and we can't do that without pulling the level above that (Group) and so on.
So let's take a look at the anatomy of the query, which has three parts. The first part is defining the Common Table Expression…
; WITH ProductCTE AS
As I said in my last blog post, a CTE is like a temporary derived table or view, with a very short lifespan. What we're going to do in the next two steps is query into ProductCTE for the search string, and then continue to query all the way down the hierarchy by comparing IDs to ParentIDs.
The second part is called the main or anchor query:
(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
The anchor query executes first, so we know that ProductCTE will contain the ID, Name, and ParentID for the single row that matches the search condition.
The final part is the actual recursive query, which is connected to the anchor query with a UNION ALL:
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.ParentID )
Note that the anchor query is performing a join between the original list of products and the ProductCTE. So each "hit" between an ID and a ParentID will go into ProductCTE, and the query will continue until no more matches are found.
OK, so that handles querying DOWNWARD….suppose we wanted to find all the parent rows instead? Just switch the column names on the INNER JOIN in the recursive query:
ON ProductCTE.ParentID = Prod.ID )
If you're wondering about any limit to the number of recursions, the default is 100, which can be configured: check out MAXRECURSION in the online help if you ever have a hierarchy with more than 100 levels. (I'm almost afraid to see a hierarchy with that many levels!)
See, that wasn't so bad? There are many examples of hierarchical data out there, such as Bill of Material data, organization chart data, etc. Anyone who has ever worked with e-commerce applications is likely VERY familiar with the challenges of hierarchical data. SQL Server 2005 makes life much easier to deal with these challenges.
More on this later…KG
Posted at 10:55 PM in SQL Development | Permalink | Comments (1)
When it comes to retrieving data, I'm a firm believer in using SQL/T-SQL and stored procedures as much as possible.
I've been debating someone on an online forum about the use of SQL/T-SQL syntax to handle retrieval requirements that don't appear to be SET-oriented. The person I'm debating proposed the following challenge as an example where the SQL language is allegedly not very efficient:
The example is a client with multiple invoices, where you need to determine the specific invoice that exceeds a specific cumulative balance. Here was an example of a post-query "data munging" approach in the database tool Visual Foxpro:
SEEK nDebtor
nDebt = 0
SCAN WHILE Debtor = nDebtor AND nDebt < 1000
nDebt = nDebt + DebtAmt
ENDSCAN
Of course, this isn't a particularly challenging task, but it is an example to demonstrate some of the great features in T-SQL 2005: in this case, Common Table Expressions and the ability to APPLY a table-valued UDF directly to a SQL SELECT query.
First, let's create some data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Invoice]') AND type in (N'U'))
DROP TABLE [dbo].[Invoice]
GO
CREATE TABLE Invoice (InvPK int identity,InvoiceNum Char(20),
DebtorID int, InvoiceDate DateTime,
InvoiceAmount decimal(14,2))
INSERT INTO Invoice
VALUES ('111',1,CAST('1-1-2007' AS DateTime), 500.00)
INSERT INTO Invoice
VALUES ('222',1,CAST('1-6-2007' AS DateTime), 200.00)
INSERT INTO Invoice
VALUES ('333',1,CAST('2-13-2007' AS DateTime),350.00)
INSERT INTO Invoice
VALUES ('444',1,CAST('2-14-2007' AS DateTime),350.00)
Second, let's write a table-valued UDF that returns the cumulative sum of invoices for a customer, based on a current date. The final query will utilize this table-valued UDF to determine the cumulative balance for each date for which an invoice exists, and see if the cumulative balance exceeds a specific threshold.
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[GetExceedingInvoice]')
AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[GetExceedingInvoice]
GO
CREATE FUNCTION dbo.GetExceedingInvoice
(@DebtorID INT,
@CutOffDate DATETIME)
RETURNS @tMaxInvoice TABLE
(InvoiceDate DATETIME,InvoiceAmount NUMERIC(14,2))
AS
BEGIN
INSERT INTO @tMaxInvoice
SELECT MAX(InvoiceDate) AS InvoiceDate,
SUM(InvoiceAmount) AS InvoiceAmount
FROM Invoice
WHERE DebtorID = @DebtorID AND
InvoiceDate <= @CutOffDate
RETURN
END
GO
Because we're writing this as a table-valued UDF, we can utilize it from any other query/stored procedure in the database.
Finally, we can build the body of the query. We'll start by setting some parameters for the Customer (Debtor) and the threshold:
DECLARE @DebtorID int
DECLARE @Threshold decimal (14,2)
SET @DebtorID = 1
SET @Threshold = 1000.00
Next we'll build a CTE (common table expression) to retrieve and store the invoices for the specific debtor. If you're not familiar with CTEs, you can think of them as a temporary dynamic view or a derived table, but one that only stays in scope for the next SQL statement.
; WITH InvCTE as (
SELECT TOP 100 PERCENT * FROM Invoice
WHERE DebtorID = @DebtorID ORDER BY InvoiceDate )
-- you need want to an ORDER BY
-- in a derived table, subquery, CTE,
-- you need to use TOP 100 Percent
Last but certainly not least, the next statement will take the CTE (InvCTE), and APPLY the table-valued UDF (GetExceedingInvoice) to determine the specific invoice and invoice date (if any) that exceeds the cumulative balance to date:
SELECT TOP 1
InvCTE.InvoiceNum,MaxInv.InvoiceDate,MaxInv.InvoiceAmount
FROM InvCTE
CROSS APPLY
dbo.GetExceedingInvoice(@DebtorID,InvoiceDate) MaxInv
WHERE MaxInv.InvoiceAmount >= @Threshold
AND DebtorID = @DebtorID
This will return the invoice that exceeded the threshold, along with the invoice date and the sum of customer invoices to that point. There are surely several ways to do this: this was a full first pass.
In conclusion, there may be data retrieval requirements that initially "appear" to be beyond what T-SQL can handle with a SET-oriented approach. Sometimes it may seem easiest to just pull some of the data into an application record set and munge through it sequentially. But hopefully this little exercise demonstrates that T-SQL 2005 does indeed provide the necessary functions to do it all in stored procedures using a SET-based approach.
Time for some beer and ice cream…KG
Posted at 08:39 PM in SQL Development | Permalink | Comments (3)
